User-Centric Heterogeneity-Aware MapReduce Job Provisioning in the Public Cloud

نویسندگان

  • Eric Pettijohn
  • Yanfei Guo
  • Palden Lama
  • Xiaobo Zhou
چکیده

Cloud datacenters are becoming increasingly heterogeneous with respect to the hardware on which virtual machine (VM) instances are hosted. As a result, ostensibly identical instances in the cloud show significant performance variability depending on the physical machines that host them. In our case study on Amazon’s EC2 public cloud, we observe that the average execution time of Hadoop MapReduce jobs vary by up to 30% in spite of using identical VM instances for the Hadoop cluster. In this paper, we propose and develop U-CHAMPION, a user-centric middleware that automates job provisioning and configuration of the Hadoop MapReduce framework in a public cloud to improve job performance and reduce the cost of leasing VM instances. It addresses the unique challenges of hardware heterogeneity-aware job provisioning in the public cloud through a novel selective-instance-reacquisition technique. It applies a collaborative filtering technique based on UV Decomposition for online estimation of ad-hoc job execution time. We have implemented U-CHAMPION on Amazon EC2 and compared it with a representative automatedMapReduce job provisioning system. Experimental results with the PUMA benchmarks show that U-CHAMPION improves MapReduce job performance and reduces the cost of leasing VM instances by as much as 21%.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Hadoop performance modeling and job optimization for big data analytics

Big data has received a momentum from both academia and industry. The MapReduce model has emerged into a major computing model in support of big data analytics. Hadoop, which is an open source implementation of the MapReduce model, has been widely taken up by the community. Cloud service providers such as Amazon EC2 cloud have now supported Hadoop user applications. However, a key challenge is ...

متن کامل

Big Data Using Hadoop

17ANSP-BD-001 Hadoop Performance Modeling for JobEstimation and Resource Provisioning MapReduce has become a major computing model for data intensive applications. Hadoop, an open source implementationof MapReduce, has been adopted by an increasingly growing user community. Cloud computing service providers such as AmazonEC2 Cloud offer the opportunities for Hadoop users to lease a certain amou...

متن کامل

Toward Optimal Resource Provisioning for Economical and Green MapReduce Computing in the Cloud

Running MapReduce programs in the cloud introduces the important problem: how to optimize resource provisioning to minimize the financial charge or job finish time for a specific job? An important step towards this ultimate goal is modeling the cost of MapReduce program. In this chapter, we study the whole process of MapReduce processing and build 1

متن کامل

Resource Provisioning Framework for MapReduce Jobs with Performance Goals

Many companies are increasingly usingMapReduce for efficient large scale data processing such as personalized advertising, spam detection, and different data mining tasks. Cloud computing offers an attractive option for businesses to rent a suitable size Hadoop cluster, consume resources as a service, and pay only for resources that were utilized. One of the open questions in such environments ...

متن کامل

Towards Optimizing Hadoop Provisioning in the Cloud

Data analytics is becoming increasingly prominent in a variety of application areas ranging from extracting business intelligence to processing data from scientific studies. MapReduce programming paradigm lends itself well to these data-intensive analytics jobs, given its ability to scale-out and leverage several machines to parallely process data. In this work we argue that such MapReduce-base...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014